Tuple Merging in Probabilistic Databases
Abstract
Real-world data are often uncertain and incomplete. In probabilistic relational data models, uncertainty can be modeled on two levels: first, by representing the uncertain instance of a tuple as a set of possible instances, and second, by assigning each tuple a degree of membership in the considered relation. To overcome incompleteness, data from multiple sources need to be combined. Combining data from autonomous probabilistic databases therefore requires an integration of probabilistic data. Until now, however, data integration approaches have focused on the integration of certain source data (relational or XML), and little attention has been paid to the integration of uncertain (especially probabilistic) source data. In this paper, we consider probabilistic tuple merging, an essential step in the integration of probabilistic data. We present techniques for merging uncertain instance data as well as for merging different degrees of tuple membership.
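The two levels of uncertainty named in the abstract can be made concrete with a small sketch. The abstract does not specify how the merging techniques work, so everything below is an assumption made for illustration: the class name `ProbTuple`, the function `merge`, the `weight_a` parameter, and the choice of a weighted mixture as the merge rule are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ProbTuple:
    """Hypothetical model of one probabilistic tuple (not the paper's notation)."""
    # Level 1: possible instances of the tuple, each mapped to its probability.
    # The probabilities are assumed to sum to 1.
    alternatives: dict
    # Level 2: degree of membership of the tuple in the relation, in [0, 1].
    membership: float


def merge(a, b, weight_a=0.5):
    """Merge two tuples describing the same entity by a weighted mixture.

    This mixture rule is an illustrative assumption, not the paper's technique.
    weight_a is the trust placed in source a; source b gets 1 - weight_a.
    """
    weight_b = 1.0 - weight_a
    merged = {}
    for alternatives, weight in ((a.alternatives, weight_a), (b.alternatives, weight_b)):
        for instance, prob in alternatives.items():
            # Instances shared by both sources accumulate probability mass.
            merged[instance] = merged.get(instance, 0.0) + weight * prob
    membership = weight_a * a.membership + weight_b * b.membership
    return ProbTuple(merged, membership)


# Two sources disagree about the same person's job and about membership.
source_a = ProbTuple({("Tim", "pilot"): 0.7, ("Tim", "pianist"): 0.3}, membership=0.8)
source_b = ProbTuple({("Tim", "pilot"): 1.0}, membership=0.6)
merged_tuple = merge(source_a, source_b)
```

Because each input distribution sums to 1 and the two weights sum to 1, the merged instance probabilities also sum to 1, which is the main reason a mixture is a convenient placeholder here.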
Similar resources
Making massive probabilistic databases practical
Existence of incomplete and imprecise data has moved the database paradigm from deterministic to probabilistic information. Probabilistic databases contain tuples that may or may not exist with some probability. As a result, the number of possible deterministic databases that can be instances of a probabilistic database grows exponentially with the number of probabilistic tuples. In this paper,...
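The exponential growth mentioned in this snippet can be illustrated with a short enumeration. The sketch assumes that tuples are independent of one another, which the snippet itself does not state; the function name `possible_worlds` and the tuple labels are hypothetical.

```python
from itertools import product


def possible_worlds(tuple_probs):
    """Yield (world, probability) pairs for a tuple-independent database.

    tuple_probs maps a tuple label to the probability that the tuple exists.
    Independence between tuples is an assumption of this sketch.
    """
    labels = list(tuple_probs)
    # Each tuple is either absent (False) or present (True): 2**n combinations.
    for presence in product([False, True], repeat=len(labels)):
        prob = 1.0
        world = []
        for label, is_present in zip(labels, presence):
            p = tuple_probs[label]
            prob *= p if is_present else 1.0 - p
            if is_present:
                world.append(label)
        yield frozenset(world), prob


worlds = list(possible_worlds({"t1": 0.9, "t2": 0.5, "t3": 0.2}))
```

Three probabilistic tuples already give 2**3 = 8 possible deterministic databases, and each additional tuple doubles that count.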
Symmetry in Probabilistic Databases
Researchers in databases, AI, and machine learning, have all proposed representations of probability distributions over relational databases (possible worlds). In a tuple-independent probabilistic database, the possible worlds all have distinct probabilities, because the tuple probabilities are distinct. In AI and machine learning, however, one typically learns highly symmetric distributions, w...
On Entity Resolution for Probabilistic Data
Entity resolution (ER) is the problem of identifying duplicate tuples, which are the tuples that represent the same real-world entity. There are many real-life applications in which the ER problem arises. These applications range from news aggregation websites, identifying the news that cover the same story, in order to avoid presenting one story several times to the user, to the integration of...
A Probabilistic NF2 Relational Algebra for Imprecision in Databases
We present a probabilistic data model which is based on relations in non-first-normal-form (NF2). Here, tuples are assigned probabilistic weights giving the probability that a tuple belongs to a relation. This way, imprecise attribute values are modelled as a probabilistic subrelation. For information retrieval, the set of weighted index terms of a document can be represented in the same way, thu...
Sensitivity Analysis and Explanations for Robust Query Evaluation in Probabilistic Databases
Probabilistic database systems have successfully established themselves as a tool for managing uncertain data. However, much of the research in this area has focused on efficient query evaluation and has largely ignored two key issues that commonly arise in uncertain data management: First, how to provide explanations for query results, e.g., “Why is this tuple in my result?” or “Why does this...